Goto

Collaborating Authors

 decision tree model


SensHRPS: Sensing Comfortable Human-Robot Proxemics and Personal Space With Eye-Tracking

Kushina, Nadezhda, Watanabe, Ko, Kannan, Aarthi, Ashok, Ashita, Dengel, Andreas, Berns, Karsten

arXiv.org Artificial Intelligence

Social robots must adjust to human proxemic norms to ensure user comfort and engagement. While prior research demonstrates that eye-tracking features reliably estimate comfort in human-human interactions, their applicability to interactions with humanoid robots remains unexplored. In this study, we investigate user comfort with the robot "Ameca" across four experimentally controlled distances (0.5 m to 2.0 m) using mobile eye-tracking and subjective reporting (N=19). We evaluate multiple machine learning and deep learning models to estimate comfort based on gaze features. Contrary to previous human-human studies where Transformer models excelled, a Decision Tree classifier achieved the highest performance (F1-score = 0.73), with minimum pupil diameter identified as the most critical predictor. These findings suggest that physiological comfort thresholds in human-robot interaction differ from human-human dynamics and can be effectively modeled using interpretable logic.


HausaMovieReview: A Benchmark Dataset for Sentiment Analysis in Low-Resource African Language

Zanga, Asiya Ibrahim, Abdulrahman, Salisu Mamman, Ado, Abubakar, Bichi, Abdulkadir Abubakar, Jibril, Lukman Aliyu, Umar, Abdulmajid Babangida, Adamu, Alhassan, Muhammad, Shamsuddeen Hassan, Abubakar, Bashir Salisu

arXiv.org Artificial Intelligence

The development of Natural Language Processing (NLP) tools for low-resource languages is critically hindered by the scarcity of annotated datasets. This paper addresses this fundamental challenge by introducing HausaMovieReview, a novel benchmark dataset comprising 5,000 YouTube comments in Hausa and code-switched English. The dataset was meticulously annotated by three independent annotators, demonstrating a robust agreement with a Fleiss' Kappa score of 0.85 between annotators. We used this dataset to conduct a comparative analysis of classical models (Logistic Regression, Decision Tree, K-Nearest Neighbors) and fine-tuned transformer models (BERT and RoBERTa). Our results reveal a key finding: the Decision Tree classifier, with an accuracy and F1-score 89.72% and 89.60% respectively, significantly outperformed the deep learning models. Our findings also provide a robust baseline, demonstrating that effective feature engineering can enable classical models to achieve state-of-the-art performance in low-resource contexts, thereby laying a solid foundation for future research. Keywords: Hausa, Kannywood, Low-Resource Languages, NLP, Sentiment Analysis


A Human-In-The-Loop Approach for Improving Fairness in Predictive Business Process Monitoring

Käppel, Martin, Neuberger, Julian, Möhrlein, Felix, Weinzierl, Sven, Matzner, Martin, Jablonski, Stefan

arXiv.org Artificial Intelligence

Predictive process monitoring enables organizations to proactively react and intervene in running instances of a business process. Given an incomplete process instance, predictions about the outcome, next activity, or remaining time are created. This is done by powerful machine learning models, which have shown impressive predictive performance. However, the data-driven nature of these models makes them susceptible to finding unfair, biased, or unethical patterns in the data. Such patterns lead to biased predictions based on so-called sensitive attributes, such as the gender or age of process participants. Previous work has identified this problem and offered solutions that mitigate biases by removing sensitive attributes entirely from the process instance. However, sensitive attributes can be used both fairly and unfairly in the same process instance. For example, during a medical process, treatment decisions could be based on gender, while the decision to accept a patient should not be based on gender. This paper proposes a novel, model-agnostic approach for identifying and rectifying biased decisions in predictive business process monitoring models, even when the same sensitive attribute is used both fairly and unfairly. The proposed approach uses a human-in-the-loop approach to differentiate between fair and unfair decisions through simple alterations on a decision tree model distilled from the original prediction model. Our results show that the proposed approach achieves a promising tradeoff between fairness and accuracy in the presence of biased data. All source code and data are publicly available at https://doi.org/10.5281/zenodo.15387576.


Machines Learn Number Fields, But How? The Case of Galois Groups

Lee, Kyu-Hwan, Lee, Seewoo

arXiv.org Artificial Intelligence

By applying interpretable machine learning methods such as decision trees, we study how simple models can classify the Galois groups of Galois extensions over $\mathbb{Q}$ of degrees 4, 6, 8, 9, and 10, using Dedekind zeta coefficients. Our interpretation of the machine learning results allows us to understand how the distribution of zeta coefficients depends on the Galois group, and to prove new criteria for classifying the Galois groups of these extensions. Combined with previous results, this work provides another example of a new paradigm in mathematical research driven by machine learning.


Can Explainable AI Assess Personalized Health Risks from Indoor Air Pollution?

Sarkar, Pritisha, Jala, Kushalava reddy, Saha, Mousumi

arXiv.org Artificial Intelligence

Acknowledging the effects of outdoor air pollution, the literature inadequately addresses indoor air pollution's impacts. Despite daily health risks, existing research primarily focused on monitoring, lacking accuracy in pinpointing indoor pollution sources. In our research work, we thoroughly investigated the influence of indoor activities on pollution levels. A survey of 143 participants revealed limited awareness of indoor air pollution. Leveraging 65 days of diverse data encompassing activities like incense stick usage, indoor smoking, inadequately ventilated cooking, excessive AC usage, and accidental paper burning, we developed a comprehensive monitoring system. We identify pollutant sources and effects with high precision through clustering analysis and interpretability models (LIME and SHAP). Our method integrates Decision Trees, Random Forest, Naive Bayes, and SVM models, excelling at 99.8% accuracy with Decision Trees. Continuous 24-hour data allows personalized assessments for targeted pollution reduction strategies, achieving 91% accuracy in predicting activities and pollution exposure.


Randomized-Grid Search for Hyperparameter Tuning in Decision Tree Model to Improve Performance of Cardiovascular Disease Classification

Pathak, Abhay Kumar, Chaubey, Mrityunjay, Gupta, Manjari

arXiv.org Artificial Intelligence

Cardiovascular disease refers to any critical condition that impacts the heart. Because heart diseases can be life-threatening. Researchers are focusing on designing smart systems to accurately diagnose them based on electronic health data, with the aid of machine learning algorithms. Heart disease classification using machine learning (ML) algorithms such as Support Vector Machine(SVM), Na\"ive Bayes(NB), Decision Trees (DTs) and Random Forests (RFs) are often hindered by overfitting. These ML algorithms need extensive hyperparameter tuning. Random Search offers a faster, and, more efficient exploration of hyperparameter space, but, it may overlook optimal regions. Grid Search, though exhaustive, but, it is computationally expensive and inefficient, particularly with high-dimensional data. To address these limitations, Randomized-Grid Search, a novel hybrid optimization method is proposed that combines the global exploration strengths of Random Search with the focused, and, exhaustive search of Grid Search in the most promising regions. This hybrid approach efficiently balances exploration and exploitation. The proposed model optimizes the hyperparameter for Decision Tree model. The proposed model is applied to UCI heart disease dataset for classification. It enhances model performance, provides improved accuracy, generalization, and computational efficiency. Experimental results demonstrate that Randomized-Grid Search outperforms traditional methods by significant margins. The proposed model provides a more effective solution for machine learning applications in healthcare diagnosis.


A Comparison of Baseline Models and a Transformer Network for SOC Prediction in Lithium-Ion Batteries

Aboueidah, Hadeel, Altahhan, Abdulrahman

arXiv.org Artificial Intelligence

Accurately predicting the state of charge of Lithium-ion batteries is essential to the performance of battery management systems of electric vehicles. One of the main reasons for the slow global adoption of electric cars is driving range anxiety. The ability of a battery management system to accurately estimate the state of charge can help alleviate this problem. In this paper, a comparison between data-driven state-of-charge estimation methods is conducted. The paper compares different neural network-based models and common regression models for SOC estimation. These models include several ablated transformer networks, a neural network, a lasso regression model, a linear regression model and a decision tree. Results of various experiments conducted on data obtained from natural driving cycles of the BMW i3 battery show that the decision tree outperformed all other models including the more complex transformer network with self-attention and positional encoding.


Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

Yadav, Sanjali, Asgari, Bahar

arXiv.org Artificial Intelligence

Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and computational demands. However, the irregular structure of sparse matrices poses significant challenges for performance optimization. Traditional hardware accelerators are tailored for specific sparsity patterns with fixed dataflow schemes - inner, outer, and row-wise but often perform suboptimally when the actual sparsity deviates from these predetermined patterns. As the use of SpGEMM expands across various domains, each with distinct sparsity characteristics, the demand for hardware accelerators that can efficiently handle a range of sparsity patterns is increasing. This paper presents a machine learning based approach for adaptively selecting the most appropriate dataflow scheme for SpGEMM tasks with diverse sparsity patterns. By employing decision trees and deep reinforcement learning, we explore the potential of these techniques to surpass heuristic-based methods in identifying optimal dataflow schemes. We evaluate our models by comparing their performance with that of a heuristic, highlighting the strengths and weaknesses of each approach. Our findings suggest that using machine learning for dynamic dataflow selection in hardware accelerators can provide upto 28 times gains.


Predicting Accident Severity: An Analysis Of Factors Affecting Accident Severity Using Random Forest Model

Adefabi, Adekunle, Olisah, Somtobe, Obunadike, Callistus, Oyetubo, Oluwatosin, Taiwo, Esther, Tella, Edward

arXiv.org Artificial Intelligence

Road accidents have significant economic and societal costs, with a small number of severe accidents accounting for a large portion of these costs. Predicting accident severity can help in the proactive approach to road safety by identifying potential unsafe road conditions and taking well-informed actions to reduce the number of severe accidents. This study investigates the effectiveness of the Random Forest machine learning algorithm for predicting the severity of an accident. The model is trained on a dataset of accident records from a large metropolitan area and evaluated using various metrics. Hyperparameters and feature selection are optimized to improve the model's performance. The results show that the Random Forest model is an effective tool for predicting accident severity with an accuracy of over 80%. The study also identifies the top six most important variables in the model, which include wind speed, pressure, humidity, visibility, clear conditions, and cloud cover. The fitted model has an Area Under the Curve of 80%, a recall of 79.2%, a precision of 97.1%, and an F1 score of 87.3%. These results suggest that the proposed model has higher performance in explaining the target variable, which is the accident severity class. Overall, the study provides evidence that the Random Forest model is a viable and reliable tool for predicting accident severity and can be used to help reduce the number of fatalities and injuries due to road accidents in the United States


Mapping Computer Science Research: Trends, Influences, and Predictions

Almutairi, Mohammed, Oguine, Ozioma Collins

arXiv.org Artificial Intelligence

This paper explores the current trending research areas in the field of Computer Science (CS) and investigates the factors contributing to their emergence. Leveraging a comprehensive dataset comprising papers, citations, and funding information, we employ advanced machine learning techniques, including Decision Tree and Logistic Regression models, to predict trending research areas. Our analysis reveals that the number of references cited in research papers (Reference Count) plays a pivotal role in determining trending research areas making reference counts the most relevant factor that drives trend in the CS field. Additionally, the influence of NSF grants and patents on trending topics has increased over time. The Logistic Regression model outperforms the Decision Tree model in predicting trends, exhibiting higher accuracy, precision, recall, and F1 score. By surpassing a random guess baseline, our data-driven approach demonstrates higher accuracy and efficacy in identifying trending research areas. The results offer valuable insights into the trending research areas, providing researchers and institutions with a data-driven foundation for decision-making and future research direction.